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Abstract 

Motivated by recently discovered privacy attacks 
on social networks, we study the problem of 
anonymizing the underlying graph of interac- 
tions in a social network. We call a graph (k,£)- 
anonymous if for every node in the graph there 
exist at least k other nodes that share at least I 
of its neighbors. We consider two combinatorial 
problems arising from this notion of anonymity 
in graphs. More specifically, given an input 
graph we ask for the minimum number of edges 
to be added so that the graph becomes (k,£)- 
anonymous. We define two variants of this mini- 
mization problem and study their properties. We 
show that for certain values of k and £ the prob- 
lems are polynomial-time solvable, while for oth- 
ers they become NP-hard. Approximation algo- 
rithms for the latter cases are also given. 



1 Introduction 

The popularity of online communities and social 
networks in recent years has motivated research 
on social-network analysis. Though these stud- 
ies are useful in uncovering the underpinnings of 
human social behavior, they also raise privacy 
concerns for the individuals involved. 

A social network is usually represented 
graph, where nodes correspond to individuals 
and edges capture relationships between these 
individuals. For example, in Linkedln, an on- 
line network of professionals, every link between 
two users specifies a professional relationship be- 
tween them. In Facebook and Orkut links cor- 
respond to friendships. There are online com- 
munities that permit any user to access the in- 
formation of every node in the graph and view 
its neighbors. However, many communities are 



increasingly restricting access to the personal 
information of other users. For example, in 
Linkedln, a user can only see the profiles of his 
own friends and their connections. 

In this paper, we consider a scenario where the 
owner of a social network would like to release 
the underlying graph of interactions for social- 
network analysis purposes, while preserving the 
privacy of its users. More specifically, the pri- 
vate information to be protected is the mapping 
of nodes to real-world entities and interconnec- 
tions amongst them. Therefore, we design an 
anonymization framework that tries to hide the 
identity of nodes by creating groups of nodes 
that look similar by virtue of sharing many of the 
same neighbors. We call such nodes anonymized. 
Our goal is to anonymize all nodes of the graph 
by introducing minimal changes to the overall 
graph structure. In this way we can guaran- 
tee that the anonymized graph is still useful for 
social-network analysis purposes. 

Recently, Backstrom et. al. [3] have shown 
that the most simple graph-anonymization tech- 
nique that removes the identity of each node in 
the graph, replacing it with a random identifi- 
cation number instead, is not adequate for pre- 
serving the privacy of nodes. Specifically, they 
show that in such an anonymized network, there 
exists an adversary who can identify target in- 
dividuals and the link structure between them. 
However, the problem of designing anonymiza- 
tion methods against such adversaries is not ad- 
dressed in [3J. 

Following the work of [4j, Hay et. al. [7] 
have very recently given a definition of graph 
anonymity: a graph is fc-anonymous if every 
node shares the same neighborhood structure 
with at least k — 1 other nodes. The definition is 
recursive, and has some nice properties studied 
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in [7J. However, the focus of [7J is mostly on the 
properties of the definitions rather than on algo- 
rithms to achieve the anonymity requirements. 

Motivated by @] and [7J, Zhou and Pei [H] 
consider the following definition of anonymity in 
graphs: a graph is /c-anonymous if for every node 
there exist at least k — 1 other nodes that share 
isomorphic 1-neighborhoods. They consider the 
problem of minimum graph-modifications (in 
terms of edge additions) that would lead to 
a graph satisfying the anonymity requirement. 
Although this definition is interesting, the al- 
gorithm presented in [18] is not supported by 
theoretical analysis. Further, if the anonymity 
definition is extended to consider the neighbor- 
hood structure beyond just the immediate 1- 
neighborhood of each node, algorithmic tech- 
niques quickly become infeasible. 

Despite the fact that privacy concerns in 
releasing social-network data have been pin- 
pointed, there is no agreement on the definition 
of privacy or anonymity that should be used for 
such data. In this paper, we try to move this 
line of research one step forward by proposing a 
new definition of graph anonymity that is inline 
to a certain extent with the definitions provided 
in [7J. Our definition of anonymity is in a sense 
less strict than the one proposed in [18]. How- 
ever, we consider it to be natural, intuitive and 
more amenable to theoretical analysis. 

Intuitively our definition aims to protect an 
individual from an adversary who knows some 
subset of the individual's neighbors in the graph. 
After anonymization, the hope is that the ad- 
versary can no longer identify the target indi- 
vidual because several other nodes in the graph 
will also share this subset of neighbors. Further, 
during anonymization, the identifying subset of 
neighbors themselves will become distorted and 
harder for the adversary to identify. 

The Problem: We define a graph to be (k, £)- 
anonymous if for every node u in the graph there 
exist at least k other nodes that share at least 
£ of their neighbors with it. In order to meet 
this anonymity requirement one could transform 
any graph into a complete graph. For a graph 
consisting of n nodes this would mean that every 
node would share n—2 neighbors with each of the 



n — 1 other nodes. Although such an anonymiza- 
tion would preserve privacy, it would make the 
anonymized graph useless for any study. For 
this reason we impose the additional requirement 
that the minimum number of such edge additions 
should be made. The aim is to preserve the util- 
ity of the original graph, while at the same time 
satisfying the (k, ^)-anonymity constraint. 

Given k and £ we formally define two vari- 
ants of the graph- anonymization problem that 
ask for the minimum number of edge additions 
to be made so that the resulting graph is (k, £)- 
anonymous. We show that for certain values of 
k and £ the problems are polynomial-time solv- 
able, while for others they are NP-hard. We also 
present simple and intuitive approximation algo- 
rithms for these hard instances. To summarize 
our contributions: 

• We propose a new definition of graph 
anonymity building on previously proposed 
definitions. 

• We provide the first formal algorithmic treat- 
ment of the graph-anonymization problem. 

Besides graph anonymization, the combinato- 
rial problems we study here may also arise in 
other domains, e.g., graph reliability. We there- 
fore believe that the problem definitions and al- 
gorithms we present are of independent interest. 

Roadmap: The rest of the paper is organized as 
follows. In Section [2] we summarize the related 
work. Section [3] gives the necessary notation 
and definitions. Algorithms and hardness results 
for different instances of the (k, £)-anonymization 
problem are given in Sections SI E] and [7J We 
conclude in Section [HJ 

2 Related Work 

As mentioned in the Introduction, there has been 
some prior work on privacy-preserving releases of 
social-network graphs. The authors in [3] show 
that the naive approach of simply masking user- 
names is not sufficient anonymization. In par- 
ticular, they show that, if an adversary is given 
the chance to create as few as 0(log(n)) new ac- 
counts in the network, prior to its release, then 
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he can efficiently recover the structure of con- 
nections between any 0(log 2 (ra)) nodes chosen 
apriori. He can do so by identifying the new ac- 
counts that he inserted in to the network. The 
focus of [4] is on revealing the power of such ad- 
versaries and not on devising methods to protect 
against them. 

In [7] the authors experimentally evaluate how 
much background information about the neigh- 
borhood of an individual would be sufficient for 
an adversary to uniquely identify that individual 
in a naively anonymized graph. Additionally, 
a new recursive definition of graph anonymity 
is given. The definition says that a graph is 
/c-anonymous if for every structure query there 
exist k nodes that satisfy it. The definition 
is constructed for a certain class of structure 
queries that query the neighborhood structure 
of the nodes. Our definition of anonymity is in- 
spired by [7], however it is substantially different. 
Moreover, the focus of our work is on the com- 
binatorial problems arising from our anonymity 
definition. 

Very recently, the authors of [TS] consider yet 
another definition of graph anonymity; a graph 
is fc-anonymous if for every node there exist at 
least k — 1 other nodes that share isomorphic 1- 
neighborhoods. This definition of anonymity in 
graphs is different from ours. In a sense it is a 
more strict one. Moreover, though the algorithm 
presented in [18] seems to work well in practice, 
no theoretical analysis of its performance is pre- 
sented. Finally, extending the privacy definition 
to more than just the 1-neighborhood of nodes 
causes the algorithms of |18j to quickly become 
infeasible. 

The problem of protecting sensitive links be- 
tween individuals in an anonymized social net- 
work is considered in [T7]. Simple edge-deletion 
and node-merging algorithms are proposed to re- 
duce the risk of sensitive link disclosure. This 
work is different from ours in that we are pri- 
marily interested in protecting the identity of the 
individuals while in [T7j the emphasis is on pro- 
tecting the types of links associated with individ- 
uals. Also, the combinatorial problems that we 
need to solve in our framework are very different 
from the set of problems discussed in [17] . 



In [6] the authors study the problem of as- 
sembling pieces of a graph owned by different 
parties privately. They propose a set of crypto- 
graphic protocols that allow a group of author- 
ities to jointly reconstruct a graph without re- 
vealing the identity of the nodes. The graph thus 
constructed is isomorphic to a perturbed version 
of the original graph. The perturbation consists 
of addition and or deletion of nodes and or edges. 
Unlike that work, we try to anonymize a single 
graph by modifying it as little as possible. More- 
over, our methods are purely combinatorial and 
no cryptographic protocols are involved. 

Korolova et. al. [8] investigate an attack 
where an adversary strategically subverts user 
accounts. He then uses the online interface pro- 
vided by the social network to gain access to lo- 
cal neighborhoods and to piece them together to 
form a global picture. The authors provide rec- 
ommendations on what the lookahead of a social 
network should be to render such attacks infeasi- 
ble. This work does not consider an anonymized 
release of the entire network graph and is thus 
different from ours. 

Besides graphs, there has been considerable 
prior work on anonymizing traditional relational 
data sets. The line of work on /c-anonymity 
found in [D HH El [T21 El [TO] aims to minimally 
suppress or generalize public attributes of indi- 
viduals in a database in such a way that every 
individual (identifiable by his public attributes) 
is hidden in a group of size at least k. Our notion 
of graph anonymity draws inspiration from this. 

Apart from suppression or generalization tech- 
niques, perturbation techniques have also been 
used to anonymize relational data sets in [21 
O [5] . Perturbation-based approaches for graph 
anonymization are also considered in [7J [16j ; in 
that case edges are randomly inserted or deleted 
to anonymize the graph. We do not consider 
perturbation-based approaches in this paper. 

3 Preliminaries 

In this section we formalize our definition of 
graph anonymity and introduce two natural op- 
timization problems that arise from it. 

Throughout the paper we assume that the 
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social- network graph is simple, i.e., it is undi- 
rected, unweighted, and contains no self-loops or 
multi-edges. This is an important category of 
graphs to study; most of the aforementioned so- 
cial networks (Facebook, Linkedln, Orkut) allow 
only bidirectional links and are thus instances 
of such simple graphs. We assume that the ac- 
tual identifiers of individual nodes are removed 
prior to further anonymization. Our definition 
for graph anonymity is inspired by the notion 
of /c-anonymity for relational data wherein each 
person, identifiable by his public attributes, is 
required to be hidden in a group of size k. In 
the case of a social-network graph, the publicly- 
known attributes of a user would be (a subset of) 
his connections (and interconnections amongst 
them) within the graph. 

Consider a simple unlabelled graph and an ad- 
versary who knows that a target individual and 
some number of his friends form a clique. In 
the released graph, the adversary could look for 
such cliques to narrow down the set of nodes that 
might correspond to the target individual. The 
goal of an anonymization scheme is to prevent 
such an adversary from uniquely identifying the 
individual and his remaining connections in the 
anonymized graph. 

We achieve this by introducing an anonymity 
property that requires that for every node in the 
graph, some subset of its neighbors should be 
shared by other nodes. In this way, an adversary 
who knows some subset of the neighbors of a tar- 
get individual and can even pinpoint them in the 
graph, will not be able to distinguish the target 
individual from other nodes in the network that 
share this subset of neighbors. Further, in the 
process of anonymization, the identifying subset 
of neighbors itself becomes distorted and harder 
for the adversary to pinpoint. More formally we 
define the (k, £)-anonymity property as follows. 

Definition 1 ((&, ^-anonymity). A graph G = 
(V,E) is (k, £)-anonymous if for each vertex v £ 
V, there exists a set of vertices U C V not con- 
taining v such that \U\ > k and for each u G U 
the vertices u and v share at least £ neighbors. 



Example 1. A clique ofn nodes is (n- 
anonymous. 



-l,n-2)- 




(a) Input graph G 




(b) (4,l)-anonymous transformation of G 



Figure 1: In Figure 1(a) an adversary can iden- 



tify Alice as the node marked X. Figure 1(b) 



is 



a (4,l)-anonymous transformation of the graph. 

To demonstrate the kinds of attacks we hope 
to protect against, we give another example. 



Example 2. Consider the graph in Figure 1(a) 



Suppose an adversary knows that Alice is in this 
graph and that Alice is connected to a friend who 
is part of a triangle. There is only one such node 
in the graph and hence the adversary will be able 
to determine that the node marked X in the graph 
uniquely corresponds to Alice. From this he may 
be able to further infer the identities of Alice's 
neighbors and their neighbors as well. Now if 
the edges shown in dotted lines in Figure 1(b) are 



added to this graph, the resulting graph is (4, 1)- 
anonymous. In this new graph, Alice is no longer 
the only node connected to a node of a triangle. 
Further, there is no longer only one triangle in 
the graph. 

Given an input graph G = (V, E) with n 
nodes, and integers k and i, our goal is thus 



4 



Clique of 

size k 




(a) Input graph G 



(b) Weakly (k- 1,1) 
- anonymized graph 
G' 



Figure 2: Illustrative example of the difference 
between weak and strong anonymity. 



to transform the graph into a (k, £)-anonymous 
graph. We focus on transformations that al- 
low only additions of edges to the original graph 
In order for the anonymized graph to remain 
useful for social-network (or other) studies, we 
need to ensure that the transformed graph is 
as close as possible to the original graph. We 
achieve this by requiring that a minimum num- 
ber of edges should be added to G so that 
the (k, £)-anonymity property holds. This leads 
us to the following two variants of the (k, £)- 
anonymization problem. 

Problem 1 (Weak (k, £)-anonymization). Given 
a graph G = (V, E) and integers k and I, find the 
minimum number of edges that need to be added 
to E, to obtain a graph G' = (V, E') that is (k, £)- 
anonymous. 

The following example illustrates the weak- 
anonymization problem. 

Example 3. Consider the input graph G of Fig- 
The graph consists of a clique of size 



ure 



2(a) 



k and 2 nodes x and y connected by an edge. 
The nodes in the clique are all (k — 1, k — 2)- 
anonymous. However, the existence of x and y 
prevents G from being fully (k—1, 1)- anonymous. 

Assume now that we connect both x and y to 
a single node u of the clique. In this way, we 
construct graph G' shown in Figure \2(b )\ Obvi- 
ously, G' is (k — 1,1) -anonymous; all the nodes 
in G' (including x and y) have k — 1 other nodes 
that share at least one of their neighbors. For x 
and y, this neighbor is node u. 



The problem in the above example is that 
graph G' satisfies the (k — 1, l)-anonymity re- 
quirement, however, the anonymity of nodes x 
and y is achieved via node u that was not a 
part of their initial set of neighbors in G. Thus, 
the goal of having many other nodes sharing 
the original neighborhood structure of x or y 
is not necessarily achieved unless we place ad- 
ditional requirements on the anonymization pro- 
cedure. To this end we introduce the problem of 
strong anonymization. Strong anonymity places 
additional restrictions on how anonymity can be 
achieved and provides better privacy. 

Definition 2 (Strong (k, ^-transformation). 

Consider graphs G = (V,E) and G' = (V,E'), 
so that E C E' and G' is (k, £) -anonymous. 
For fixed k and i, we say that G' is a strongly- 
anonymized transformation of G, if for every 
vertex v G V , there exists a set of vertices U C V 
not containing v such that \U\ > k and for each 
uGU, \N G (v)nN G ,(u)\ > £. Here N G (v) is the 
set of neighbors of v in G, and N G /(u) is the set 
of neighbors of u in G' . 

Therefore, if a graph G' is a strong (k, £)- 
transformation of graph G, then each vertex in 
G' is required to have k other vertices sharing at 
least I of its original neighbors in G. For this 
to be possible, every vertex must have at least £ 
neighbors in the original graph G to begin with. 

Example 4. Consider again the graph G of 



Figure 2(a) and its transformation to graph G' 
shown in Figure 2(b). In Example^ we showed 
that graph G' is (k — 1, \)-anonymous in the 
weak sense. However, in order to get a strong 
(k—1, 1) -transformation of G, we would have to 
connect each of the nodes x and y to k — 1 other 
nodes from the clique. 

The definition of a strong (k, ^-transformation 
gives rise to the following strong (k,£)- 
anonymization problem. 

Problem 2 (Strong (k, ^-anonymization). 

Given a graph G = (V, E) and integers k and 
£, find the minimum number of edges that need 
to be added to E, to obtain graph G' = (V, E') 
that is a strong (k, £) -transformation of G. 



5 



Obviously achieving strong anonymity would 
require the addition of a larger number of edges 
than weak anonymity. This statement is formal- 
ized as follows. 

Proposition 1. Consider input graph G = 
(V, E) and integers k and I. Let G' = (V, E') 
be the (k, £) -anonymous graph that is the opti- 
mal solution for Problem [7j, and G" = (V, E") 
be the (k,£)- anonymous graph that is the opti- 
mal solution for Problem [H Then it holds that 
\E"\ > \E'\. 

The notion of (k, £)-anonymity is strongly re- 
lated to the immediate neighbors of a node in 
the graph, and how these are shared with other 
nodes. Therefore, for every node u it is impor- 
tant to know the nodes that are reachable from u 
via a path of length exactly 2. Given its impor- 
tance, we define the notion of 2-neighborhood of 
a node as follows. 

Definition 3 (2-neighborhood). Given a graph 
G = (V, E) and a node v £ V we define the 2- 
neighborhood of v to be the set of all nodes in 
G that are reachable from v via paths of length 
exactly 2. 

We also define two more terms that will be 
used in the rest of the paper. 

Definition 4 (Residual Anonymity). Consider 
a graph G = (V, E) that we would like to make 
(k, I) -anonymous. Consider any node v £ V and 
suppose that k' other nodes in the graph share 
at least I of v's neighbors. Then, we define the 
residual anonymity of v to be r(v) = max{k — 
k',0}. The residual anonymity of a graph G = 
(V,E) is defined to be r(G) = YlveV r(v). 

We define the concept of a deficient node for 
nodes that are not (k, £)-anonymous. 

Definition 5 (Deficient Node). A node v is de- 
ficient if r(v) > 0. 

It is the deficient nodes that we need to take 
care of in order to anonymize a graph. With 
these definitions in hand, we are now ready to 
proceed to the technical results of the paper. 



4 (2, l)-anonymization 

In this section we provide polynomial-time 
algorithms for the weak and strong (2, 1)- 
anonymization problems. First, it is easy to see 
that there is a simple characterization of (2, 1)- 
anonymous graphs. This fact is captured in the 
following proposition. 

Proposition 2. A graph G = (V,E) is (2,1)- 
anonymous if and only if each vertex u £ V is 
(a) part of a triangle, (b) adjacent to a vertex of 
degree at least 3, or (c) is the middle vertex in a 
path of 5 vertices. 

The main idea of the algorithms that we de- 
velop for (2, l)-anonymization is that they add 
the minimum number of edges so that every ver- 
tex of the resulting graph satisfies one of the con- 
ditions of Proposition [2J Both algorithms pro- 
ceed in two phases: the deficit- assignment and 
the deficit-matching phase. The deficit assign- 
ment requires a linear scan of the graph in which 
deficits are assigned to vertices. Roughly speak- 
ing, a deficit of 1 signifies that the vertex needs 
to be connected to another vertex of non-zero 
deficit by the addition of an extra edge. This 
added edge ensures that the (2, l)-anonymity re- 
quirement for the vertex or its neighbors will be 
satisfied. Once the deficits are assigned to ver- 
tices the algorithms proceed to the actual addi- 
tion of edges. The edges are added by taking into 
account the deficits of all vertices. For example, 
two vertices both of deficit 1 can be connected 
by the addition of a single edge (if they are not 
already neighbors and are not isolated). In this 
way, a single edge accommodates a total deficit 
of 2. The minimum number of edges to be added 
can be found via a matching of the vertices with 
deficits. The matching consists of edges that are 
not already in the graph. A perfect matching is 
the matching that satisfies all the deficits. In the 
case of weak anonymization, this matching can 
be found in linear time by randomly pairing up 
non-adjacent vertices with deficits. For strong 
anonymization, it needs to be explicitly com- 
puted by solving the maximum-matching prob- 
lem over edges that are not already in the graph. 

Another key point in the development of our 
algorithms is that in order to assign deficits it 
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suffices to explore only vertices that are within a 
distance 4 from some leaf vertex or from a vertex 
of degree 2. Any other vertex can be shown to 
satisfy the conditions of Proposition [2j Finally, 
it only requires a case analysis to show that our 
algorithms optimally assign deficits to vertices, 
independently of the order in which they traverse 
the vertices of the input graph during the first 
phase. For lack of space we only give a sketch of 
the algorithms and proofs in this section. 

4.1 Linear-time weak (2, l)-anonymi- 
zation 

As we have already mentioned our algorithm 
for the weak (2, l)-anonymization problem has 
two phases (1) deficit assignment and (2) deficit 
matching^ 

Deficit Assignment: First assume that the 
input graph has no isolated vertices - we will 
show how to deal with isolated vertices later. 
For the deficit-assignment phase, the algorithm 
starts with an unmarked vertex of degree 1 or 2 
and explores vertices within a distance 4 of it. 
Deficits are assigned as follows: 

• For an isolated edge uv, we assign deficit 1 
to u and deficit 1 to v; it may be that both 
edges will be added at u. 

• For an isolated path uvw, we assign deficit 
1 to v. 

• For an isolated path uvwx, we assign deficit 
1 to v and deficit 1 to w. 

• For a subgraph consisting of a path uvw 
with adjacent vertices attached to ro, we as- 
sign deficit 1 to v. 

• For a component uvXi with vertex u having 
degree one with vertex v connected to a set 
of vertices such that each x € Xi has de- 
gree 1 (and no other vertices) assign deficit 
1 to v. This component corresponds to an 
isolated star centered at v. 

1 Recall that a node u is assigned deficit i if i edges 
need to be added between other non-zero deficit vertices 
and u in order to satisfy the anonymity requirements of 
u or u's neighbors. 



• For a component consisting of a square 
uvwx (isolated square), we assign deficit 1 
to u and deficit 1 to w; it may be that the 
two edges will be added at u and v, or that 
u and w will be joined. 

• For a subgraph consisting of a square uvwx 
with edges (one or more) uxi coming out of 
the square, we assign deficit 1 to v. 

• For a subgraph consisting of squares 
uviwxi, UV2WX2, uvjWXj, we assign 
deficit 1 to one of the i^'s. 

• Finally, for a subgraph consisting of a vertex 
u adjacent to vertices Xi of degree 1 and to 
a vertex y of degree 2, assign deficit 1 to y. 

All the vertices that are visited in this process 
are marked (that is the assigned deficits cover 
all marked vertices) and the deficit-assignment 
process repeats starting with the next unmarked 
vertex until no more unmarked vertices of degree 
1 or 2 remain. 

Deficit Matching: If the number of vertices 
with deficit 1 is 2m, and 2m > 4 or 2m = 2 

- in some case other than an isolated edge uv 

- then, we need to find any perfect matching 
amongst these vertices to find the edges to add. 
The matching of deficits can be done in linear 
time since any (random) pairing of non-adjacent 
vertices with non-zero deficits suffices. In this 
case we add m extra edges. If the number of 
vertices with deficit 1 is 2m + 1, then all but one 
of these vertices can be matched, and a single 
edge needs to be added to the remaining vertex, 
connecting it to some vertex of degree at least 
2. This results in a total of m + 1 extra edges. 
There are, however, some special cases that we 
need to take care of first. 

Special Cases: Before finding the perfect 
matching we match all isolated edges to each 
other. This is because the isolated edges need 
to be connected in a special way to take care of 
the deficits at the two ends. For a pair of isolated 
edges uv and u'v' , we add the edges uu' and vu' 
(we treat the two deficits of 1 at u and v as being 
concentrated at u). In the end we may be left 
with a single isolated edge uv. In this case, two 
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edges need to be added and we can connect them 
to any other vertex in the graph forming a trian- 
gle. Similarly, in the case where the remainder 
is an isolated star centered at v with vertices x\ 
of degree one, it is enough to add a single edge 
to connect vertices Xj and xy of the star. 

Isolated Vertices: It remains to take care of 
isolated vertices. For this we consider a set of six 
isolated vertices u,v,w,u',v',w' and we connect 
them with edges uv,uw,uu', u'v',u'w'. These 
five edges can take care of the six isolated ver- 
tices. In general, the vertices with deficit 1 can 
be attached to isolated vertices first, with two ex- 
ceptions to be considered next. When we have 
an isolated edge xy, one of the two deficits of 1 
can be satisfied by connecting x to an isolated 
vertex, but the other one can also be satisfied 
by connecting x to an isolated vertex u if u is 
also made adjacent to two other isolated vertices 
v and w to obtain the above mentioned compo- 
nent. Similarly if x is only adjacent to vertices 
Ui of degree 1, then the deficit 1 at a; can only be 
matched to an isolated u if u is also made adja- 
cent to two other isolated vertices v and w. In 
the end we will be left with fewer than six iso- 
lated vertices which each need one edge. These 
can be connected to any vertex in the graph of 
degree at least 2. The optimality follows because 
a tree on 5 vertices is optimal saving. 

Theorem 1. The above algorithm solves opti- 
mally the weak (2, l)-anonymization problem in 
linear time. 

Proof Sketch: It requires a case analysis (that 
we omit for lack of space) to show that the 
deficit-assignment scheme we described above is 
complete and optimal and that the total deficit 
assigned is independent of the order in which the 
vertices of the graph are traversed. Since we find 
a perfect matching, we satisfy these deficits with 
as few edges as possible, hence, the optimality of 
the algorithm. 

It is also easy to see that the deficit- assignment 
takes time linear with respect to the number of 
edges in the graph: first we only consider vertices 
of degree one or two as starting points. For every 
such vertex we only have to explore all vertices 
within a distance 4. This is because any other 



vertex can be seen to satisfy one of the condi- 
tions of Proposition [2J After each iteration of the 
deficit assignment, we mark all the vertices that 
have been visited in this process as marked (that 
is the assigned deficits cover all visited vertices). 
The deficit-assignment process continues start- 
ing with the next unmarked vertex of degree 1 
or 2. The scanning of the algorithm requires only 
linear time with respect to the number of edges 
in the graph since every traversed edge connects 
only marked endpoints and thus no edge needs 
to be traversed more than once by the algorithm. 

The deficit-matching phase is also linear since 
it only requires to find any (random) matching 
between non-adjacent deficits. 

4.2 Polynomial-time strong (2, 1)- 
anonymization 

The algorithm for solving the strong (2, 1)- 
anonymization problem is very similar to the 
one presented in the previous section, so we only 
briefly discuss it here. For brevity we avoid men- 
tioning various special cases that are similar to 
the weak-anonymization problem. The first key 
difference is that for strong (2, l)-anonymization 
we need to develop a different deficit-assignment 
scheme. Although the actual structures we have 
to consider for assigning the deficits are the same 
we need to assign different deficits to different 
vertices so that we satisfy the strong anonymity 
requirement. This is because an edge added at 
a vertex with assigned deficit can only help the 
original neighbors of the vertex, and not the ver- 
tex itself. The second difference is that in the 
deficit-matching phase we need to actually solve 
a maximum-matching problem; not every ran- 
dom pairing of non-adjacent vertices with as- 
signed deficit is a valid solution. 

In strong (2, l)-anonymization we first have 
to assume that there are no isolated vertices 
in the input graph G; otherwise strong (2, 1)- 
anonymity is not achievable for these vertices. 

Deficit Assignment: For the deficit- 
assignment step, the algorithm starts with 
an unmarked vertex in the input graph with 
degree 1 or 2 and assigns deficits as follows: 

• For an isolated edge uv, assign deficit of 2 
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at each end. 

• For an isolated path uvwx, put deficit 1 at 
v and at w. 

• For an isolated square uvwx, put deficit 1 
at u and v. 

• If such a square has edges already coming 
out of v, put just deficit 1 at n. 

• If multiple squares uviWXi all start from ver- 
tex u, then assign deficit 1 to one of the Uj's. 

• For a path uvw, put deficit 1 at each of the 
3 vertices. 

• For a vertex of degree at least 3 attached to 
vertices of degree 1, put two deficits of 1 at 
degree 1 vertices. 

• If a path starts uvwx, with x of degree at 
least 2, put deficit 1 at v and 1 at w. 

• If in addition w has other edges coming out 
of it, put deficit 1 just at v. Otherwise if in 
addition only v has other edges coming out 
of it that join to a vertex of degree 1, put 
deficit 1 just at w. 

All vertices that are visited in the process are 
marked, and the algorithm proceeds with the 
next unmarked vertex until there are no un- 
marked vertices left. 

Deficit Matching: For solving the strong (2, 1) 
- anonymization problem exactly we need to 
solve a maximum-matching problem between the 
nodes with deficits. This can be done in polyno- 
mial time (|13j). Note, that in the weak (2, 1)- 
anonymization problem any random pairing of 
non-adjacent nodes with deficits was sufficient, 
allowing for a linear-time matching phase. This 
was because with the exception of isolated edges 
and isolated paths of length 4, there was no case 
in which two vertices of non-zero deficit could 
be adjacent. This is not the case in the strong 
anonymization problem, and here a maximum- 
matching problem needs to be solved over edges 
that are not already in the graph. 

A linear-time deficit-matching algorithm with 
a small additive error can also be developed. 
This is summarized in the following theorem. 



Theorem 2. The strong (2, 1)- anonymization 
problem can be approximated in linear time 
within an additive error of 2, and can be solved 
exactly in polynomial time. 

Proof Sketch: It requires again a case analy- 
sis to show that the deficit-assignment scheme is 
optimal and independent of the order in which 
we traverse the vertices. 

Now, if all deficits add up to m, they can easily 
be paired using a greedy linear-time matching al- 
gorithm. However, the last 2 deficits may be as- 
signed to adjacent vertices. So instead of adding 
[~m/2] edges, we may add [~m/2] +2, for an ad- 
ditive error of 2. If instead we use a maximum- 
matching algorithm to match as many deficits as 
possible and satisfy the unmatched deficits indi- 
vidually, the problem can be solved optimally in 
polynomial time. 

5 From (6, 1) to (7, l)-anonymity 

We show here that given a graph that is al- 
ready (6,l)-anonymous, it is NP-hard to find 
the minimal number of edges that need to be 
added to make it either weakly or strongly (7,1)- 
anonymous. This result provides insight into the 
complexity of the anonymization problem, show- 
ing that it is hard to achieve anonymity even 
incrementally. The result follows from a reduc- 
tion from the l-in-3 satisfiability problem. An 
instance of l-in-3 satisfiability consists of triples 
of Boolean variables (x, y, z) to be assigned val- 
ues or 1 in such a way that each triple contains 
one 1 and two Os. This problem was shown to 
be NP-complete by Schaefer [15] . We first show 
that even a restricted form of the l-in-3 satisfia- 
bility problem is NP-complete. 

Lemma 1. The l-in-3 satisfiability problem is 
NP-complete even if each variable occurs in ex- 
actly 3 triples, no two triples share more than 
one variable, and the total number of triples is 
even. 

Proof. We prove this by taking an arbitrary in- 
stance of the l-in-3 satisfiability problem and 
converting it to an instance satisfying the con- 
straints of the above lemma. We start off by 
renaming multiple occurrences of a variable x as 
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x\, x%, and so on, so that by the end, each vari- 
able occurs in at most 1 triple and no two triples 
share more than one variable. We can then en- 
force the condition that each xi be equal to xi + \ 
by inserting the triples (xi,u,v), (ccj+i, v! , v'), 
(u,u',w) and (v,v',w). This guarantees at most 
3 occurrences of each variable in triples. If a 
variable y occurs in 2 triples, we may include a 
triple (y, z, t) introducing two new variables, so 
that at the end of this process each variable oc- 
curs in either 1 or 3 triples. Finally we make 
nine copies of the entire instance, each labeled 
(i,j) with 1 < i,j < 3, and equate the zs that 
have the same i and also equate the ts that have 
the same j. This guarantees that each variable 
appears in exactly 3 triples. Making two copies 
of this instance guarantees that the number of 
triples is even. □ 

Theorem 3. Suppose G is (6,1) -anonymous. 
Finding the smallest set of edges to add to G 
to solve the weak or strong (7,l)-anonymization 
problem is NP-hard. The same results hold for 
going from (k,l)- anonymity to weak or strong 
(k + 1, l)-anonymity when k > 6. 

Proof. We show this via a reduction from the 1- 
in-3 satisfiability problem. We take an instance 
of the l-in-3 satisfiability problem satisfying the 
constraints of Lemma [TJ We further assume that 
the number of triples in this instance is a mul- 
tiple of 3, since if it is not a multiple of 3, it 
is easy to see that there will be no satisfying as- 
signment. Since we also assume that the number 
of triples is even, the number of triples is in fact 
of the form 6m. 

Taking this instance, we now form a cubic bi- 
partite graph G = (U,V,E) by creating a vertex 
in U for each triple and a vertex in V for each 
variable, with the two vertices connected by an 
edge if the variable occurs in the triple. We add 
5 new neighbors of degree 1 to each vertex in 
U. Each of these added neighbors and the ver- 
tices in V are (7, l)-anonymous, but the vertices 
in U have only 6 vertices at distance 2, namely 
the 2 other neighbors of each of the 3 neighbors 
in V, giving (6, l)-anonymity. We would like to 
increase the anonymity of these vertices so that 
they are also (7, l)-anonymous. Note that a so- 
lution to this anonymity problem has to consist 



of at least m edges. This is because the total 
residual anonymity of the graph is 6m and each 
new edge can reduce the residual anonymity by 
at most 6. Now, if it were possible to select 2m 
vertices in V that were adjacent to all the 6m 
vertices in U, we could insert a perfect matching 
of m edges between these 2m vertices and simul- 
taneously increase the anonymity of all the ver- 
tices in U by at least 1. This would correspond 
to a solution to the l-in-3 satisfiability problem. 
Similarly, if there is a solution to the anonymity 
problem that involves the addition of only m 
edges, it must necessarily correspond to a solu- 
tion to the l-in-3 satisfiability problem. Thus a 
solution to the l-in-3 satisfiability problem ex- 
ists if and only if the solution to the anonymity 
problem involves the addition of m extra edges. 

For k > 6, add k — 2 nodes of degree 1 attached 
to each vertex in U. Attach an additional node 
of degree k — 5 to each vertex in U. Attach the 
remaining k — 6 neighbors of each such additional 
node to a clique of size k + 2. The result then 
follows from the case of k = 6. □ 

The complexity of minimally obtaining weak 
and strong (fc, l)-anonymous graphs remains 
open for k = 3,4, 5, 6. 

6 (k, l)-anonymization 

We start our study for the (k, l)-anonymization 
problem by giving two simple 0(k)- 
approximation algorithms. We then show 
that the approximation factor can be further 
improved to match a lower bound. 

6.1 O (/^-approximation algorithms 
for (k, l)-anonymization 

Let G = (V, E) be the input graph to the weak 
(k, l)-anonymization problem. Consider the fol- 
lowing simple iterative algorithm: at every step i 
add to graph Gi [G\ = G) a single edge between 
a neighbor of a deficient node u and a node that 
is not already in the 2-neighborhood of m in Gj. 
If there are only isolated deficient nodes in Gi, 
the algorithm directly connects a deficient node 
to a node of a {k + l)-clique. If no such clique 
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exists, the algorithm creates it in a preprocess- 
ing step; (k + 1) randomly selected nodes are 
picked for this purpose. Repeat the process un- 
til no deficient nodes remain. We call this algo- 
rithm the Weak-Any algorithm. We show that 
Weak-Any is an 0(fc)-approximation algorithm 
for the weak (k, l)-anonymization problem. This 
result is summarized in the following theorem. 

Theorem 4. Weak-Any gives a 0(k)- 
approximation for the weak (k, l)-anonymization 
problem. If the optimal solution is of size t, 
Weak-Any adds at most 4kt + k 2 edges. 

Proof. Let R = Ylvev r ( v ) be the residual 
anonymity (see Definition 0|) of graph G = 
(V,E). Let Wa be the total number of edges 
added by the Weak-Any algorithm. It holds that 
Wa < R + k 2 . This is because at every step 
the algorithm adds one edge that decreases the 
residual anonymity of the graph by at least 1. 
Therefore the algorithm adds at most R edges. 
The additional k 2 edges may be required to cre- 
ate a (fc + l)-clique if such a clique does not exist. 

Now assume that the optimal solution adds t 
edges. Consider an edge uv of the optimal so- 
lution. This edge, at the time of its addition, 
could have decreased the residual anonymity of 
the graph by at most 4k. This is because it could 
have decreased the residual anonymity of each of 
u and v as well as the residual anonymities of at 
most k neighbors connected to u and at most 
k neighbors connected to v (if u or v had more 
than k neighbors, then none of these neighbors 
would have been deficient). Further, the edge uv 
could have decreased the residual anonymity of 
u or v by at most k, and the residual anonymi- 
ties of each of the k neighbors of u or each of the 
k neighbors of v by at most 1. 

Thus, each edge of the optimal solution could 
have reduced the residual anonymity of the 
graph by at most 4k at the time of its addition. 
That is, t > R/4k. 

Thus it is clear that Wa < 4kt + k 2 . □ 

For the strong (k, l)-anonymization problem 
we show that the Strong-Any algorithm (very 
similar to Weak-Any), is an 0(/c)-approximation. 
Strong-Any is also iterative: in each iteration 
i it considers graph G{ and adds one edge to 



it. The edge to be added is one that connects 
a neighbor of a deficient node u to a node that 
is not already in the 2-neighborhood of u. This 
process is repeated till no deficient nodes remain. 
We can state the following for the approximation 
ratio achieved by the Strong-Any algorithm. 

Theorem 5. Strong-Any is a 2k -approximation 
algorithm for the strong (k,l)-anonymization 
problem. 

Proof. As in the proof of Theorem H] consider 
input graph G = (V, E) with initial resid- 
ual anonymity R. Every edge added by the 
Strong-Any algorithm would reduce the residual 
anonymity of the graph by at least 1. Therefore, 
if the number of edges added by the Strong-Any 
algorithm is Sa we have that Sa < R. 

Suppose now that the optimal solution adds t 
edges. An added edge uv decreases the residual 
anonymity of the graph by at most Ik. This 
is because the edge can decrease the residual 
anonymity of only the original neighbors of u and 
v by at most 1 each and there can be at most Ik 
such deficient neighbors. Thus t > R/2k. 

From the above we have that Sa < 2kt. □ 

6.2 ©(logn)-approximation algorithms 
for (k, l)-anonymization 

We now provide two simple greedy algorithms for 
the weak and strong (k, l)-anonymization prob- 
lems and show that they output solutions that 
are 0(logn)-approximations to the optimal. We 
then show that this is the best approximation 
factor we can hope to achieve for arbitrary k. 

We start by presenting Weak-Greedy which 
is an 0(logn)-approximation algorithm for the 
weak (k, l)-anonymization problem. Consider 
input graph G = (V, E) that has total residual 
anonymity R. The optimal solution to the prob- 
lem consists of a set of edges that together take 
care of all the residual anonymity in the graph. 

We may be tempted to use a set-cover type so- 
lution: greedily choose edges to add that maxi- 
mally reduce the residual anonymity of the graph 
at each step. However, such a greedy algorithm 
is not so easy to analyze in the context of the 
weak-anonymization problem. The difficulty in 
the analysis stems from the fact that the new 
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Figure 3: Illustrative example of the rein- 
forcement between new edges in the weak- 
anonymization problem. 



edges may reinforce each other. That is, the ad- 
dition of an edge may bring about a greater re- 
duction in the residual anonymity of the graph in 
the presence of other added edges. Consider, for 
example, the input graph G shown in Figure El 
Note that solid lines correspond to the original 
edges in G. In this case, the addition of edge 
x2zl alone does not help in the anonymization 
of node y2. (Neither does the addition of edge 
y2zl in the anonymization of x2). However, if 
edge y2zl is already added in the graph, then 
edge x2zl helps in anonymizing node y2 as well. 

We get around this peculiarity of our prob- 
lem by greedily choosing triplets of edges to add 
instead of singleton edges. Algorithm 1, called 
Weak-Greedy, describes the procedure. 

Algorithm 1 Weak-Greedy for weak (k, 1)- 
anonymization 

1: //Input: k,G= (V,E) 

2: Randomly choose a node w € V 

3: Add up to ( k 2 1 ) edges to E to form a k + 1- 
clique at w 

4: Compute R = residual anonymity of G 

5: while R > do 

6: Find triplet uv, uw, vw that maximally de- 
crease R 

7: E = E U {uv} U {uw} U {vw} 
8: Update R 
9: end while 



Theorem 6. Weak-Greedy is a polynomial-time 
nearly O(logn)- approximation algorithm for the 
weak (k, l)-anony- mization problem. If the op- 
timal solution is of size t, the algorithm adds 
k 2 + 6i log n edges. 

Proof. Consider the optimal solution of t edges. 
These t edges together take care of all the resid- 



ual anonymity in the graph. We can convert this 
solution to a solution of triplets that consists of 
at most k 2 + 3t edges: first randomly choose 
a node w and create a (k + l)-clique amongst 
w and k other randomly chosen nodes. Then, 
for each edge uv of the optimal solution, add 
a triangle (uv, vw, uw) to the graph. The re- 
sulting graph will clearly continue to be (k, 1)- 
anonymous. The t triangles in conjunction with 
the (k + l)-clique take care of all the residual 
anonymity in the graph. Further, these triangles 
do not reinforce each other because they are all 
connected to a node of degree k. 

Going back to Algorithm 1, this means that 
once a (fc+l)-clique has been added to the graph, 
at each iteration of the algorithm, there must 
exist some triangle with a vertex in the (k + 1)- 
clique that reduces the residual anonymity of the 
graph by a factor of at least t (similar to the 
argument for the greedy set cover algorithm). 
And since the algorithm greedily chooses tri- 
angles to add, the residual anonymity of the 
graph will decrease by at least this factor at 
each step. Since the residual anonymity of the 
graph can be at most kn < n 2 to begin with, 
the algorithm will only proceed for at most r 
iterations till (1 — l/t) r < 1/kn. This would 
mean that r = 0(t\og(kn)) = 0(2£logn) and 
3r = 0(6ilogn). □ 

The approximation algorithm for the strong 
(k, l)-anony- mization problem is simpler, since 
added edges cannot reinforce each other — an 
added edge can only help the original neighbors 
of its two end points. Algorithm 2 gives the de- 
tails of the Strong-Greedy algorithm. 

Algorithm 2 Strong-Greedy for (k, 1)- 

anonymization 
1: //Input: k,G = (V,E) 
2: Compute R = residual anonymity of G 
3: while R > do 

4: Find edge uv that maximally reduces R 
5: E = EU{uv} 
6: Update R 
7: end while 



Since the added edges do not reinforce each 
other in the strong (k, l)-anonymization prob- 
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lem, the analysis of Strong-Greedy is similar to 
the analysis of the greedy algorithm for the stan- 
dard set-cover problem. 

Theorem 7. Strong-Greedy is a polynomial- 
time 2 log n- approximation algorithm for the 
strong (k, l)-anonymization problem. 

Proof. Suppose the optimal solution adds t 
edges, to reduce the residual anonymity of the 
graph by at most kn < n 2 . Since edges of 
the solution do not reinforce each other, there 
must exist some edge that reduces the residual 
anonymity of the graph by at least a factor of t. 

Therefore at each iteration of Algorithm 2, 
we greedily choose an edge to add that must 
cause at least this much reduction in the resid- 
ual anonymity of the graph. The algorithm will 
thus terminate after r steps where (1 — l/t) r < 
l/(kn), or r = tlog(kn) < 2tlogn. □ 

We next show that log n is the best factor we 
could hope to achieve for unbounded k, for both 
the weak and strong (k, l)-anonymization prob- 
lems via an approximation-preserving reduction 
from the hitting set problem. 

Theorem 8. The weak and strong (k,l)- 
anonymization problems with k unbounded are 
{} (log n)- approximation NP-hard. 

Proof. Hitting set is f2(logn)-approximation 
NP-hard. Consider an instance of the hitting- 
set problem consisting of sets S = {Si, S2, ■ ■ ■}■ 
Let k be greater than the maximum number of 
sets intersecting any one set Si. Add a unique 
element V{ to each S{. Additionally, construct 
sets T = {Ti, T2, . . .} such that each Tj contains 
the appropriate v^s so that every Si intersects 
exactly k — 1 other sets. In every set Tj add an 
additional element w so that each set in T in- 
tersects at least k other sets. Now construct a 
bipartite graph G = (U, V, E), where the vertices 
of U correspond to the sets in S and T, the ver- 
tices of V correspond to individual members of 
these sets, with E indicating membership of ele- 
ments from V in sets from U. For every element 
u in U create (k + 1) new vertices of degree 1 in 
V and connect them to u. In the resulting graph, 
the vertices in V are all (k, l)-anonymous, how- 
ever the vertices in U that correspond to sets in 



S are only (k — 1, l)-anonymous. Consider the t 
nodes in V that are the optimal solution to the 
hitting-set problem. Then matching these nodes 
using \t/2] edges will be an optimal solution to 
the strong or weak (k, l)-anonymization problem 
in the bipartite graph G = (U,V,E). Therefore, 
an optimal solution to the anonymization prob- 
lem corresponds to an optimal solution to the 
hitting-set problem which is r2(logn)-hard to ap- 
proximate. □ 

7 (k, £)-anonymization for £ > 1 

In this section we provide algorithms for the 
weak and strong (k, £)-anonymization problems 
when t > 1. 

The algorithm for weak (k, £)-anonymization 
is a randomized algorithm that constructs a 
bounded-degree expander between deficient ver- 
tices. Given a (k, £')-anonymous graph G, it 
solves the weak (k, ^)-anonymization problem by 
adding only 0(^k — k')i) additional edges at 
each vertex. The algorithm can also be easily 
adapted to solve the weak (k, £)-anonymization 
problem for any input graph irrespective of its 
initial anonymity. 

Theorem 9. There exists a random- 
ized polynomial-time algorithm that adds 
0(\/ (k — k')i) edges per vertex and increases 
the anonymity of a graph from (k',£) to (k,£) 
where £ < k < n 1_e and e is a constant greater 
than 0. 

Proof Sketch: Randomly partition the n 
vertices into n/£ sets of size £. Treat each set as 
a "supernode" . Construct an expander of degree 
y/(k — k')/£ on these n/£ supernodes. In this 
way each supernode has (k — k')£ supernodes in 
its 2-neighborhood that can be reached through 
just one intermediate supernode. Replace each 
edge uv of this expander with a clique 
of edges between the constituent vertices of 
the supernodes u and v. Thus each vertex 
now has k — k' vertices in its 2-neighborhood 
that can be reached through an intermediate 
set of size £. Since I < k < n 1_e , we can 
show that with high probability, none of these 
k — k! new vertices will coincide with the k' 
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vertices previously in the node's 2-neighborhood. 

As a final result, we present the algorithm for 
strong (k, ^-anonymization. This algorithm is a 
generalization of the Strong-Greedy algorithm 
(see Algorithm 2). The difference is that instead 
of picking a single edge to add at every itera- 
tion the algorithm picks edges in groups of size 
at most £. At each iteration it picks the group 
that causes the largest reduction in the resid- 
ual anonymity of the graph. The pseudocode is 
given in Algorithm 3. 



Algorithm 3 Strong-Greedy for (k,£)- 
anonymization 

1: //Input: k,£,G= (V,E) 

2: Compute R = residual anonymity of G 

3: while R > do 

4: Find set of edges £, with \£\ < £, that 

maximally reduces R 
5: E = EU£ 
6: Update R 

7: end while 



We can state the following theorem for the ap- 
proximation factor of Algorithm 3 when I is a 
constant. 

Theorem 10. Consider G = (V, E) to be the 

input graph to the strong (k,£)- anonymization 
problem. Also assume £ is a constant. Let t be 
the optimal number of edges that need to be added 
to solve the strong (k,£)- anonymization problem 
on G. Then Algorithm 3 is a polynomial-time 
0(i^ _1 log n)- approximation algorithm. 

Proof Sketch: In the (k,£)- anonymization 
problem, groups of up to £ edges at a time in- 
cident at a single vertex can reduce the residual 
anonymity of a vertex adjacent to the £ endpoints 
of these edges. The t edges added by the opti- 
mal solution define at most t l subsets of at most 
£ edges incident to a single vertex. By selecting 
such subsets greedily as in a set-cover problem we 
ultimately reduce the residual anonymity of the 
graph to in 0(t e log n) steps. We can show that 
reinforcement effects between subsets of edges 
are taken care of. This proves the 0(t logn) 
bound on the number of edges selected. If £ is 



a small constant, the approximation factor may 
not be too large. Further, in practice this simple 
algorithm may perform better than this worst 
case bound indicates. 

8 Conclusions 

Motivated by recent studies on privacy- 
preserving graph releases, we proposed a new 
definition of anonymity in graphs. We further 
defined two new combinatorial problems aris- 
ing from this definition, studied their complexity 
and proposed simple, efficient and intuitive algo- 
rithms for solving them. 

The key idea behind our anonymization 
scheme was to enforce that every node in the 
graph should share some number of its neigh- 
bors with k other nodes. The optimization prob- 
lems we defined ask for the minimum number of 
edges to be added to the input graph so that 
the anonymization requirement is satisfied. For 
these optimization problems we provided algo- 
rithms that solve them exactly (k = 2) or ap- 
proximately (k > 2). 

An interesting avenue for future work would be 
to fully characterize the kinds of attacks that our 
definition of anonymity protects against, and to 
study the impact of our anonymization schemes 
on the utility of the graph release. 

Finally, we believe that the combinatorial 
problems we have studied in this paper are in- 
teresting in their own right, and may also prove 
useful in other domains. For example, at a high 
level there is a similarity between the problem we 
study in this paper and the problem of construct- 
ing reliable graphs for, say, reliable routing. 
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